1 A multimedia cognitive-based information retrieval system 98%

D. Davcev , D. Cakmakov , V. Cabukovski 
Proceedings of the 19th annual conference on Computer Science April 1999



2 An application of a multimedia cognitive-based information retrieval 97% 
system (AMCIRS) in mineralogy

Danco Davcev , Dusan Cakmakov 

Proceedings of the 1993 ACM conference on Computer science March 1993 
A Multimedia Cognitive-based Information Retrieval System called AMCIRS, which
integrates image and text information, has been described in [11], [12]. The AMCIRS
query mechanism is based on multimedia object content search using the
vector model. The content search process is reduced to similarity estimation
between query and index vectors. The main objective of this paper is to present an
application of AMCIRS in mineralogy. The experimental evaluati ...



3 A cell-based index structure for similarity search in high-dimensional 97% 
feature spaces

Kwang-Taek Song , Hwa-Jin Nam , Jae-Woo Chang 

Proceedings of the 2001 ACM symposium on Applied computing March 2001 

4 Automatic text indexing using complex identifiers 96% 
Gerard Salton

Proceedings of the ACM conference on Document processing systems January
2000

5 Using n-grams for Korean text retrieval 95%

Joon Ho Lee , Jeong Soo Ahn
Proceedings of the 19th annual international ACM SIGIR conference on Research
and development in information retrieval August 1996



6 Fast retrieval of cursive handwriting 94% 
Ibrahim Kamel 

Proceedings of the fifth international conference on Information and knowledge 
management November 1996 



7 Searching dynamically bundled goods with pairwise relations 93%

Yuan-Chi Chang , Chung-Sheng Li , John R. Smith 

Proceedings of the 4th ACM conference on Electronic commerce June 2003 

Economics research has long recognized that bundling enables savings in production 
and transaction costs, promotes complementarity among the bundle components and
sorts consumers according to their valuations. Sellers employ market analysis and 
intelligence to extract the most surplus. In the age of electronic commerce with low 
product information access cost, buyers can take advantage of the benefits of 
bundling by performing dynamic composition of goods from multiple companies 
offering heterogen ... 



8 Indexing very high-dimensional sparse and quasi-sparse vectors for 92%
similarity searches

Changzhou Wang , X. Sean Wang 

The VLDB Journal — The International Journal on Very Large Data Bases April 
2001 

Volume 9 Issue 4 

Similarity queries on complex objects are usually translated into searches among 
their feature vectors. This paper studies indexing techniques for very high- 
dimensional (e.g., in hundreds) vectors that are sparse or quasi-sparse, i.e., vectors 
each having only a small number (e.g., ten) of non-zero or significant values. Based 
on the R-tree, the paper introduces the xS-tree that uses lossy compression of 
bounding regions to guarantee a reasonable minimum fan-out within the allocated 
stora ... 



9 Experiments in retrieval of mineral information 92%

Dusan Cakmakov , Danco Davcev 
Proceedings of the first ACM international conference on Multimedia September
1993



10 High performance clustering based on the similarity join 89% 

Christian Bohm , Bernhard Braunmuller , Markus Breunig , Hans-Peter Kriegel 
Proceedings of the ninth international conference on Information and knowledge
management November 2000 



11 Query processing in a heterogeneous retrieval network 88% 

P. Simpson

Proceedings of the 11th annual international ACM SIGIR conference on Research
and development in information retrieval May 1988 

The concept of a large-scale information retrieval network incorporating 
heterogeneous retrieval systems and users is introduced, and the necessary 
components for enabling term-based searching of any database by untrained end-
users are outlined. We define a normal form for expression of queries, show that such
queries can be automatically produced, if necessary, from a natural-language request
for information, and give algorithms for translating such queries, with little or no loss
of expre ... 

12 Combining multiple evidence from different properties of weighting 87%
schemes 

Joon Ho Lee 

Proceedings of the 18th annual international ACM SIGIR conference on Research 
and development in information retrieval July 1995 

13 Spatial indexing of high-dimensional data based on relative 86%
approximation

Yasushi Sakurai , Masatoshi Yoshikawa , Shunsuke Uemura , Haruhiko Kojima 

The VLDB Journal — The International Journal on Very Large Data Bases October 

2002 

Volume 11 Issue 2 

We propose a novel index structure, the A-tree (approximation tree), for similarity 
searches in high-dimensional data. The basic idea of the A-tree is the introduction of 
virtual bounding rectangles (VBRs) which contain and approximate MBRs or data
objects. VBRs can be represented quite compactly and thus affect the tree 
configuration both quantitatively and qualitatively. First, since tree nodes can contain 
a large number of VBR entries, fanout becomes large, which increases search speed. 
More ... 



14 Information retrieval using a singular value decomposition model of 85% 
latent semantic structure

G. W. Furnas , S. Deerwester , S. T. Dumais , T. K. Landauer , R. A. Harshman , L. A. 
Streeter , K. E. Lochbaum 

Proceedings of the 11th annual international ACM SIGIR conference on Research
and development in information retrieval May 1988

In a new method for automatic indexing and retrieval, implicit higher-order structure 
in the association of terms with documents is modeled to improve estimates of term- 
document association, and therefore the detection of relevant documents on the basis 
of terms found in queries. Singular-value decomposition is used to decompose a large 
term by document matrix into 50 to 150 orthogonal factors from which the original 
matrix can be approximated by linear combination; both documents and terms ... 

15 A belief network model for IR 82% 

Berthier A. N. Ribeiro , Richard Muntz
Proceedings of the 19th annual international ACM SIGIR conference on Research
and development in information retrieval August 1996 



16 Online analytic processing: CMVF: a novel dimension reduction scheme 77%
for efficient indexing in a large image database

Jialie Shen , Anne H. H. Ngu , John Shepherd , Du Q. Huynh , Quan Z. Sheng 
Proceedings of the 2003 ACM SIGMOD international conference on
Management of data June 2003 

17 Parallel text search methods 71%
Gerard Salton , Chris Buckley
Communications of the ACM February 1988
Volume 31 Issue 2 

A comparison of recently proposed parallel text search methods to alternative 
available search strategies that use serial processing machines suggests parallel 
methods do not provide large-scale gains in either retrieval effectiveness or 
efficiency. 



18 Image Retrieval: Adaptive nearest neighbor search for relevance 71%
feedback in large image databases

P. Wu , B. S. Manjunath

Proceedings of the ninth ACM international conference on Multimedia October 
2001 

Relevance feedback is often used in refining similarity retrievals in image and video
databases. Typically this involves modification to the similarity metrics based on the
user feedback and recomputing a set of nearest neighbors using the modified
similarity values. Such nearest neighbor computations are expensive given that
typical image features, such as color and texture, are represented in high dimensional
spaces. Search complexity is a critical issue while dealing with large databases and ...



19 Vector space model of information retrieval: a reevaluation 65%

S. K. M. Wong , Vijay V. Raghavan 
Proceedings of the 7th annual international ACM SIGIR conference on Research
and development in information retrieval July 1984

In this paper we, in essence, point out that the methods used in the current vector 
based systems are in conflict with the premises of the vector space model. The 
considerations, naturally, lead to how things might have been done differently. More 
importantly, it is felt that this investigation will lead to a clearer understanding of the 
issues and problems in using the vector space model in information retrieval. 



20 Hypertext databases and data mining 64% 

Soumen Chakrabarti 

ACM SIGMOD Record , Proceedings of the 1999 ACM SIGMOD international
conference on Management of data June 1999 
Volume 28 Issue 2 

The volume of unstructured text and hypertext data far exceeds that of structured 
data. Text and hypertext are used for digital libraries, product catalogs, reviews, 
newsgroups, medical reports, customer service reports, and the like. Currently 
measured in billions of dollars, the worldwide internet activity is expected to reach a 
trillion dollars by 2002. Database researchers have kept some cautious distance from 
this action. The goal of this tutorial is to expose database researchers to t ... 

21 Distance-based indexing for high-dimensional metric spaces 61%

Tolga Bozkaya , Meral Ozsoyoglu
ACM SIGMOD Record , Proceedings of the 1997 ACM SIGMOD international
conference on Management of data June 1997
Volume 26 Issue 2

In many database applications, one of the common queries is to find approximate
matches to a given query item from a collection of data items. For example, given an
image database, one may want to retrieve all images that are similar to a given query
image. Distance based index structures are proposed for applications where the data
domain is high dimensional, or the distance function used to compute distances
between data objects is non-Euclidean. In this paper, we introduce a distance bas ...



22 Sequence Mining: Prefix-querying: an approach for effective 57%
subsequence matching under time warping in sequence databases

Sanghyun Park , Sang-Wook Kim , June-Suh Cho , Sriram Padmanabhan

Proceedings of the tenth international conference on Information and knowledge
management October 2001

This paper discusses an index-based subsequence matching that supports time
warping in large sequence databases. Time warping enables finding sequences with
similar patterns even when they are of different lengths. In our earlier work, we
suggested an efficient method for whole matching under time warping. This method
constructs a multi-dimensional index on a set of feature vectors, which are invariant
to time warping, from data sequences. For filtering at feature space, it also ap ...



23 Hierarchical indexing and document matching in BoW 

Maayan Geffet , Dror G. Feitelson 

Proceedings of the first ACM/IEEE-CS joint conference on Digital libraries January
2001

BoW is an on-line bibliographical repository based on a hierarchical concept index to
which entries are linked. Searching in the repository should therefore return matching 
topics from the hierarchy, rather than just a list of entries. Likewise, when new 
entries are inserted, a search for relevant topics to which they should be linked is 
required. We develop a vector-based algorithm that creates keyword vectors for the 
set of competing topics at each node in the hierarchy, and show how its ...

24 Data integration using similarity joins and a word-based information 54% 
representation language

William W. Cohen 

ACM Transactions on Information Systems (TOIS) July 2000 
Volume 18 Issue 3 

The integration of distributed, heterogeneous databases, such as those available on 
the World Wide Web, poses many problems. Here we consider the problem of
integrating data from sources that lack common object identifiers. A solution to this
problem is proposed for databases that contain informal, natural-language "names"
for objects; most Web-based databases satisfy this requirement, since they usually 
present their information to the end-user through a veneer of text. We des ... 
,()[


These characters end a text token. 


= > < ! 


These characters end a text token because they signify the 
start of a field operator. (! is special: only != ends a token.)


' @\Q < 
{ [ ! 


These characters signify the start of a delimited token.
A delimited token is terminated by the end character
associated with its start character.
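
The rules for ending a text token can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the actual scanner: the
function name read_text_token is hypothetical, whitespace is assumed to end a
token as well, and the note about ! is read as meaning that ! alone does not
end a token while != does. Delimited tokens are not handled, since their
start/end character pairs are not fully recoverable here.

```python
# Hypothetical sketch of the text-token termination rules described above.
# Assumptions: whitespace also ends a token; '!' ends a token only as part of '!='.

TERMINATORS = set(",()[")      # characters that simply end a text token
OPERATOR_STARTS = set("=><")   # characters that start a field operator


def read_text_token(text, start=0):
    """Return (token, end_index) for the text token beginning at `start`."""
    i = start
    while i < len(text):
        ch = text[i]
        if ch.isspace() or ch in TERMINATORS or ch in OPERATOR_STARTS:
            break
        # '!' is special: it ends the token only when it begins '!='.
        if ch == "!" and text[i + 1:i + 2] == "=":
            break
        i += 1
    return text[start:i], i
```

For instance, under these assumptions read_text_token("name!=bob") stops at
the != operator, while a lone ! inside a word is kept as part of the token.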